[
,
] and to investigate how gastrectomy impacts on gastric cancer
based on profiling faecal microbiome and metabolome
ntari, et al., 2020].
The definition and working principle of LDA
Suppose the nth observation is represented by a vector $\mathbf{x}$ and
the label of $\mathbf{x}$ is denoted by $y$. In terms of protease cleavage
pattern discovery, $\mathbf{x}$ is a peptide, which is labelled by $y$ as
either cleaved or non-cleaved. The classification label for this type of
data is binary. Normally, a non-cleaved peptide $\mathbf{x}$ is labelled
by a zero, i.e., $y = 0$, and a cleaved peptide $\mathbf{x}$ is labelled
by a one, i.e., $y = 1$. A general format of a classification model is
shown below,
$$\hat{y} = f(\mathbf{x}, \mathbf{w}) \tag{3.1}$$
where $\mathbf{w}$ is a vector of model parameters, $f$ is a
classification function, and $\hat{y}$ is a prediction corresponding to
$y$. In a well-constructed classifier, $\hat{y}$ should be a numerical
value close to zero if $y = 0$ and $\hat{y}$ should be a numerical value
close to one if $y = 1$. If a classification problem is linear, a
classification model can be formulated as below,
$$\hat{y} = \mathbf{w}^{t}\mathbf{x} = x_1 w_1 + x_2 w_2 + \cdots + x_d w_d \mapsto y \tag{3.2}$$
where $x_i$ corresponds to the ith independent variable of vector
$\mathbf{x}$ and $w_i$ stands for the ith weight in $\mathbf{w}$, which is
used to weigh the contribution of $x_i$. A vector-matrix format of a
linear classifier is formulated as below, where $\mathbf{X}$ is an input
matrix and $\hat{\mathbf{y}}$ is an output vector,
$$\hat{\mathbf{y}} = \mathbf{X}^{t}\mathbf{w} \tag{3.3}$$
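As a minimal sketch of equations (3.2) and (3.3), the NumPy snippet below computes the weighted sum for each observation and then the same predictions in vector-matrix form. The toy data, the weight values, and the helper name `predict_one` are hypothetical and chosen only for illustration. The layout of the input matrix (one observation per column, so that the transpose multiplies the weight vector) is an assumption inferred from equation (3.3), not something the text states.

```python
import numpy as np

# Hypothetical toy data: d = 3 variables, N = 4 observations.
# Assumption: each COLUMN of X is one observation x, so X has shape (d, N).
X = np.array([
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 3.0],
    [2.0, 1.0, 0.0, 1.0],
])

# Hypothetical weights, picked for illustration only (not fitted by LDA).
w = np.array([0.5, -1.0, 0.25])


def predict_one(x, w):
    """Equation (3.2): weighted sum x_1*w_1 + ... + x_d*w_d for one observation."""
    return sum(x_i * w_i for x_i, w_i in zip(x, w))


# Equation (3.3): all N predictions at once, y_hat = X^t w.
y_hat = X.T @ w

# The matrix form agrees with applying equation (3.2) column by column.
y_hat_loop = np.array([predict_one(X[:, n], w) for n in range(X.shape[1])])
assert np.allclose(y_hat, y_hat_loop)

print(y_hat)  # one projected value per observation
```

Each entry of `y_hat` is the projection of one observation onto the direction given by `w`. The sketch does not estimate `w`; it only shows how a given weight vector maps the input matrix to a one-dimensional output.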
A major part of LDA is to find the best projection direction to map a
high-dimensional genotype space ($\mathbf{X}$) to a one-dimensional
phenotype space. To make an LDA model work, the density of
$\hat{\mathbf{y}}$ is required to be bimodal. Only when this bimodality is
maximised should the projection direction or the model parameters
$\mathbf{w}$ be considered as an optimal solution